The weighting is the hardest part: On the behavior of the Likelihood Ratio Test and the Score Test under a data-driven weighting scheme in sequenced samples

Abstract

Sequence-based association studies are at a critical inflexion point as exome-sequencing data become increasingly available. A popular test of association is the sequence kernel association test (SKAT), in which weights are embedded to reflect the hypothesized contribution of each variant to the trait variance. Because the true weights are generally unknown, and hence subject to misspecification, we examined the efficiency of a data-driven weighting scheme. We propose using a set of theoretically defensible weighting schemes and assume that the scheme yielding the largest test statistic is the one most likely to capture the relationship between allele frequency and functional effect. We show that using alternative weights obviates the need to impose arbitrary allele-frequency thresholds. Because both the score test and the likelihood ratio test (LRT) may be used in this context, and their power may differ, we characterize the behavior of both. The two tests have equal power when the set includes weights resembling the correct ones. However, when the weights are badly specified, the LRT has superior power owing to its robustness to misspecification. With this data-driven weighting procedure, the LRT detected significant signals in two genes located in regions already confirmed as associated with schizophrenia, PRRC2A (p = 1.020e-06) and VARS2 (p = 2.383e-06), in the Swedish schizophrenia case-control cohort of 11,040 individuals with exome-sequencing data. The score test is currently preferred for its computational efficiency and power; indeed, under correct specification, it is in some circumstances the most powerful test. The LRT, however, is generally more robust and more powerful under weight misspecification. This is an important result given that misspecified models are arguably the rule rather than the exception in weighting-based approaches.
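The data-driven weighting idea described above can be sketched in a few lines. The abstract does not list the candidate schemes, so this minimal sketch assumes the Beta(MAF; a, b) density family that SKAT uses by default: Beta(1, 25) upweights rare variants, Beta(1, 1) gives flat weights, and Beta(0.5, 0.5) is proportional to 1/sqrt(MAF(1 - MAF)). It also uses the score-type SKAT statistic Q = sum_j (w_j S_j)^2, with S_j = sum_i G_ij (y_i - mean(y)), for a quantitative trait without covariates, instead of the LRT, which requires fitting a linear mixed model. The function names (beta_density, unit_norm, skat_q, best_weighting, permutation_pvalue), the rescaling of each weight vector to unit norm, the permutation calibration of the maximum and the toy data are all hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def beta_density(x, a, b):
    """Beta(a, b) probability density at x (0 < x < 1), via log-gamma."""
    if not 0.0 < x < 1.0:
        raise ValueError("allele frequency must lie strictly between 0 and 1")
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def unit_norm(weights):
    """Rescale a weight vector to unit Euclidean norm (hypothetical choice,
    so that statistics from different schemes share a comparable scale)."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]


def skat_q(genotypes, phenotype, weights):
    """Score-type SKAT statistic, up to a constant factor:
    Q = sum_j (w_j * S_j)^2 with S_j = sum_i G_ij * (y_i - mean(y)).
    Null model: intercept only, quantitative trait, no covariates."""
    y_bar = sum(phenotype) / len(phenotype)
    resid = [y - y_bar for y in phenotype]
    q = 0.0
    for j, w in enumerate(weights):
        s_j = sum(row[j] * r for row, r in zip(genotypes, resid))
        q += (w * s_j) ** 2
    return q


def best_weighting(genotypes, phenotype, schemes):
    """Return (scheme name, Q) for the scheme giving the largest statistic."""
    n, p = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele; genotypes are coded 0/1/2.
    maf = [sum(row[j] for row in genotypes) / (2 * n) for j in range(p)]
    stats = {}
    for name, (a, b) in schemes.items():
        w = unit_norm([beta_density(f, a, b) for f in maf])
        stats[name] = skat_q(genotypes, phenotype, w)
    best = max(stats, key=stats.get)
    return best, stats[best]


def permutation_pvalue(genotypes, phenotype, schemes, n_perm=200, seed=1):
    """Calibrate the max-over-schemes statistic by permuting the phenotype,
    because picking the best scheme on the same data inflates type I error."""
    rng = random.Random(seed)
    _, observed = best_weighting(genotypes, phenotype, schemes)
    y = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if best_weighting(genotypes, y, schemes)[1] >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 8 individuals x 3 variants (made up for illustration).
    G = [[0, 1, 2], [1, 0, 1], [0, 0, 1], [0, 1, 0],
         [1, 0, 2], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
    y = [1.2, 2.5, 0.3, 0.9, 2.8, 0.1, 1.0, 0.2]
    # Hypothetical candidate set of weighting schemes.
    schemes = {"beta(1,25)": (1, 25), "beta(1,1)": (1, 1), "beta(0.5,0.5)": (0.5, 0.5)}
    name, q = best_weighting(G, y, schemes)
    print(name, q, permutation_pvalue(G, y, schemes))
```

With real data one would also adjust for covariates, handle binary outcomes such as case-control status, and, following the abstract, consider the LRT, which the authors found more robust when the weights are misspecified.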